Low Complexity Spectral Imputation for Noise Robust Speech Recognition
Author: Julien van Hout
Abstract
Low Complexity Spectral Imputation for Noise Robust Speech Recognition, by Julien van Hout. Master of Science in Electrical Engineering, University of California, Los Angeles, 2012. Professor Abeer Alwan, Chair.

With the recent push of Automatic Speech Recognition (ASR) capabilities to mobile devices, the user's voice is now recorded in environments with a potentially high level of background noise. To reduce the sensitivity of ASR performance to these distortions, techniques have been proposed that preprocess the speech waveforms to remove noise effects while preserving discriminative speech information. At the expense of increased complexity, recent algorithms have significantly improved recognition accuracy but remain far from human performance in highly noisy environments.

With a concern for both complexity and performance, this thesis investigated ways to reduce the corruptive effect of noise by directly weighting the power spectrum (SMF_pow) or log-spectrum (SMF_log) of speech by a mask whose values are within [0,1] and are indexed on the local relative prominence of speech and noise energy. Additional contributions include a low-complexity approach to mask estimation and the use of spectral flooring for matching the dynamic range of clean and noisy spectra.

These two techniques are evaluated on two standard noisy ASR databases: the Aurora-2 connected digits recognition task with 11 words, and the Aurora-4 continuous speech recognition task with 5000 words. On the Aurora-2 task, the SMF_log algorithm leads to state-of-the-art performance, with a limited complexity compared to existing techniques. The SMF...
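The mask weighting described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the sigmoid mapping from local SNR to a [0,1] weight, the noise estimate taken from the first few frames, the floor value, and all function and parameter names (`estimate_noise`, `soft_mask`, `slope`, `floor`) are assumptions chosen only for illustration.

```python
import numpy as np

def estimate_noise(noisy_pow, n_frames=5):
    """Assumed noise estimate: mean power of the first n_frames (taken to be speech-free)."""
    return noisy_pow[:n_frames].mean(axis=0)

def soft_mask(noisy_pow, noise_pow, slope=1.0, eps=1e-12):
    """Map a local SNR estimate (in dB) to a weight in [0, 1].

    The sigmoid is a hypothetical choice; the abstract only states that the
    mask lies in [0, 1] and depends on local speech vs. noise prominence.
    """
    speech_pow = np.maximum(noisy_pow - noise_pow, eps)  # crude speech power estimate
    snr_db = 10.0 * np.log10(speech_pow / np.maximum(noise_pow, eps))
    return 1.0 / (1.0 + np.exp(-slope * snr_db))

def weight_power_spectrum(noisy_pow, mask):
    """Weight the power spectrum directly by the mask."""
    return mask * noisy_pow

def weight_log_spectrum(noisy_pow, mask, floor=1.0):
    """Floor the spectrum, take the log, then weight by the mask.

    With floor >= 1 (an assumption), the log values are non-negative,
    so multiplying by a mask in [0, 1] can only attenuate them.
    """
    log_spec = np.log(np.maximum(noisy_pow, floor))
    return mask * log_spec
```

Here `noisy_pow` is assumed to be a (frames, frequency bins) array of power-spectral values. Time-frequency cells dominated by noise receive weights near 0, and cells dominated by speech receive weights near 1.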
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
The Mel Frequency Cepstral Coefficients (MFCC) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Missing Feature Imputation of Log-spectral Data for Noise Robust ASR
In this paper, we present a missing feature (MF) imputation algorithm for log-spectral data with applications to noise robust ASR. Drawing from previous work [1], we adapt the previously proposed spectrographic reconstruction solution to the liftered log-spectral domain by introducing log-spectral flooring (LS-FLR). LS-FLR is shown to be an efficient and effective noise robust feature extractio...
State based imputation of missing data for robust speech recognition and speech enhancement
Within the context of continuous-density HMM speech recognition in noise, we report on imputation of missing time-frequency regions using emission state probability distributions. Spectral subtraction and local signal-to-noise estimation based criteria are used to separate the present from the missing components. We consider two approaches to the problem of classification with missing data: ma...
Mask estimation in non-stationary noise environments for missing feature based robust speech recognition
In missing feature based automatic speech recognition (ASR), the role of the spectro-temporal mask in providing an accurate description of the relationship between target speech and environmental noise is critical for minimizing the degradation in ASR word accuracy (WAC) as the signal-to-noise ratio (SNR) decreases. This paper demonstrates the importance of accurate characterization of instanta...
Robust automatic speech recognition with missing and unreliable acoustic data
Human speech perception is robust in the face of a wide variety of distortions, both experimentally applied and naturally-occurring. In these conditions, state-of-the-art automatic speech recognition technology fails. This paper describes an approach to robust ASR which acknowledges the fact that some spectro-temporal regions will be dominated by noise. For the purposes of recognition, these re...